Exploiting Partial Information in Taxonomy Construction
Abstract
One of the core services provided by description logic (DL) reasoners is classification: determining the subsumption quasi-ordering over the concept names occurring in a knowledge base (KB) and caching this information in the form of a directed acyclic graph known as the concept hierarchy or taxonomy. For less expressive DLs, such as members of the EL family, it may be possible to derive all the relevant subsumption relationships in a single computation [Baader et al., 2005]. In general, however, it will be necessary to “deduce” the subsumption relation by performing individual subsumption tests between pairs of concept names. For n concept names this will, in the worst case, require n² tests, but for the tree-shaped hierarchies typically found in realistic KBs much better results can be achieved using algorithms that construct the taxonomy incrementally, traversing the partially constructed taxonomy to find the right place to insert each concept name.

This kind of algorithm suffers from two main difficulties. First, individual subsumption tests can be computationally expensive: for some complex KBs, even state-of-the-art reasoners may take a long time to perform a single test. Second, even when subsumption tests themselves are very fast, a knowledge base containing a very large number of concepts will obviously result in a very large taxonomy, and repeatedly traversing this structure can be costly.

The first difficulty is usually addressed by using an optimized construction that tries to minimize the number of subsumption tests needed to deduce the subsumption relation. Most implemented systems use an “enhanced traversal” algorithm due to Ellis [1991] and Baader et al. [1994], which adds concepts to the taxonomy one at a time using a two-phase, top-down and bottom-up, breadth-first search of the partially constructed taxonomy.
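The incremental, two-phase insertion described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Ellis's or Baader et al.'s actual implementation: the DL reasoner is replaced by a hypothetical `subsumes(a, b)` oracle over a made-up toy hierarchy (`SUPERS`), equivalent and unsatisfiable concept names are ignored, and the rule used to prune the bottom-up phase (only nodes lying below every parent found top-down are tested) is an illustrative assumption. The "enhanced" part is that a node is tested only after all of its predecessors on the search path have passed.

```python
from collections import deque

TOP, BOTTOM = "TOP", "BOTTOM"


class Taxonomy:
    """A partially constructed taxonomy: a transitively reduced DAG from TOP to BOTTOM."""

    def __init__(self, subsumes):
        self.subsumes = subsumes  # hypothetical oracle: True iff b is subsumed by a
        self.tests = 0            # number of oracle calls actually made
        self.children = {TOP: {BOTTOM}, BOTTOM: set()}
        self.parents = {TOP: set(), BOTTOM: {TOP}}

    def _test(self, a, b):
        self.tests += 1
        return self.subsumes(a, b)

    def _search(self, c, up, allowed=None):
        """Breadth-first search for the frontier of nodes related to the new name c.

        up=False: from TOP downwards, returning the most specific subsumers of c.
        up=True:  from BOTTOM upwards, returning the most general subsumees of c.
        """
        start, end = (BOTTOM, TOP) if up else (TOP, BOTTOM)
        succ = self.parents if up else self.children  # direction of travel
        pred = self.children if up else self.parents  # nodes passed on the way to y
        memo = {start: True}

        def related(y):
            if y not in memo:
                if y == end or (allowed is not None and y not in allowed):
                    memo[y] = False
                else:
                    # Enhanced traversal: only test y if all of its predecessors passed.
                    memo[y] = all(related(p) for p in pred[y]) and (
                        self._test(c, y) if up else self._test(y, c))
            return memo[y]

        frontier, queue, seen = set(), deque([start]), {start}
        while queue:
            node = queue.popleft()
            hits = [y for y in succ[node] if related(y)]
            if not hits:
                frontier.add(node)  # no successor is related, so node is on the frontier
            for y in hits:
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        return frontier

    def _descendants(self, node):
        out, stack = set(), [node]
        while stack:
            for y in self.children[stack.pop()]:
                if y not in out:
                    out.add(y)
                    stack.append(y)
        return out

    def insert(self, c):
        parents = self._search(c, up=False)
        # Pruning (assumption): anything c subsumes must lie below every parent of c.
        allowed = set.intersection(*(self._descendants(p) for p in parents))
        children = self._search(c, up=True, allowed=allowed)
        for p in parents:  # drop edges p -> s that are now replaced by p -> c -> s
            for s in children:
                self.children[p].discard(s)
                self.parents[s].discard(p)
        self.parents[c], self.children[c] = set(parents), set(children)
        for p in parents:
            self.children[p].add(c)
        for s in children:
            self.parents[s].add(c)


# Hypothetical ground truth standing in for a reasoner: all strict subsumers of each name.
SUPERS = {
    "Animal": set(),
    "Mammal": {"Animal"},
    "Bird": {"Animal"},
    "Dog": {"Mammal", "Animal"},
    "Cat": {"Mammal", "Animal"},
    "Penguin": {"Bird", "Animal"},
}

tax = Taxonomy(lambda a, b: a in SUPERS[b])
for name in ["Animal", "Mammal", "Bird", "Dog", "Cat", "Penguin"]:
    tax.insert(name)

n = len(SUPERS)
print(tax.tests, "subsumption tests; pairwise testing would need", n * (n - 1))
```

Because this toy run inserts names in top-down order, most bottom-up candidates are pruned without any test, and the total number of oracle calls stays well below the n(n − 1) pairwise comparisons.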
The algorithm exploits the structure of the KB to identify “obvious” subsumers (so-called told subsumers) of each concept, and uses this information in a heuristic that chooses the order in which concepts are added, the goal being to construct the taxonomy top-down; it also exploits information from the top-down search to prune the bottom-up search. The second difficulty can be addressed by optimizations that try to identify a subset of the concepts for which complete information about the subsumption relation can be...
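The told-subsumer ordering heuristic can be sketched in the same hedged spirit. This example assumes a hypothetical KB containing only axioms of the form A ⊑ B1 ⊓ … ⊓ Bk between concept names (`AXIOMS`), reads the told subsumers off those axioms by transitive closure, and uses Python's `graphlib.TopologicalSorter` (Python 3.9+) to order the names so that each one is added after all of its told subsumers. Real reasoners extract told subsumers from richer axioms and must also deal with told cycles, which this sketch does not handle: `static_order()` raises `graphlib.CycleError` if the told graph contains a cycle.

```python
from graphlib import TopologicalSorter

# Hypothetical toy KB: each pair (A, {B1, ..., Bk}) encodes the axiom A ⊑ B1 ⊓ ... ⊓ Bk.
AXIOMS = [
    ("Dog", {"Mammal", "Pet"}),
    ("Mammal", {"Animal"}),
    ("Penguin", {"Bird"}),
    ("Bird", {"Animal"}),
    ("Pet", {"Animal"}),
]


def told_subsumers(axioms):
    """Map every concept name to its told subsumers (transitive closure of the axioms)."""
    direct = {}
    for lhs, rhs in axioms:
        direct.setdefault(lhs, set()).update(rhs)
        for name in rhs:
            direct.setdefault(name, set())

    told = {}

    def close(a):
        if a not in told:
            told[a] = set()  # assumes the told hierarchy is acyclic
            for b in direct[a]:
                told[a] |= {b} | close(b)
        return told[a]

    for a in direct:
        close(a)
    return told


def insertion_order(told):
    """Order names so that each one follows all of its told subsumers (top-down)."""
    # TopologicalSorter expects node -> predecessors; the predecessors here are the
    # told subsumers, so more general names are emitted first.
    return list(TopologicalSorter(told).static_order())


told = told_subsumers(AXIOMS)
order = insertion_order(told)
print(sorted(told["Dog"]))  # → ['Animal', 'Mammal', 'Pet']
print(order)
```

Adding names in this order means that, when a name is inserted, its told subsumers are normally already in the taxonomy, which is what lets the construction proceed top-down.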
Similar Resources
Taxonomy Construction Techniques – Issues and Challenges
For any information to be organized, taxonomy is essential. Taxonomy plays a very important role for information and content management. Also it helps in searching of content. The most common method for constructing taxonomy was the manual construction. As the information available today is huge, constructing taxonomy for such information manually was time consuming and maintenance was difficul...
Hierarchical Taxonomy Extraction by Mining Topical Query Sessions
Search engine logs store detailed information on Web users' interactions. Thus, as more and more people use search engines on a daily basis, important trails of users' common knowledge are being recorded in those files. Previous research has shown that it is possible to extract concept taxonomies from full text documents, while other scholars have proposed methods to obtain similar queries from q...
Using Taxonomic Background Knowledge in Propositionalization and Rule Learning
Knowledge representations using semantic web technologies often provide information which translates to explicit term and predicate taxonomies in relational learning. Here we show how to speed up the process of propositionalization of relational data by orders of magnitude, by exploiting such ontologies through a novel refinement operator used in the construction of conjunctive relational featu...
Resolving Task Specification and Path Inconsistency in Taxonomy Construction
Taxonomies, such as Library of Congress Subject Headings and Open Directory Project, are widely used to support browsing-style information access in document collections. We call them browsing taxonomies. Most existing browsing taxonomies are manually constructed, so they cannot easily adapt to arbitrary document collections. In this paper, we investigate both automatic and interactive tech...
SIFT: An Algorithm for Extracting Structural Information From Taxonomies
In this work we present SIFT, a 3-step algorithm for the analysis of the structural information represented by means of a taxonomy. The major advantage of this algorithm is the capability to leverage the information inherent to the hierarchical structures of taxonomies to infer correspondences that allow them to be merged in a later step. This method is particularly relevant in scenarios where t...
Publication date: 2009